  • Sharp analysis of power iteration for tensor PCA

    Updated: 2024-09-30 23:34:30
    Yuchen Wu, Kangjie Zhou. 25(195):1–42, 2024. Abstract: We investigate the power iteration algorithm for the tensor PCA model introduced in Richard and Montanari (2014). Previous work studying the properties of tensor power iteration is either limited to a constant number of iterations, or requires a non-trivial data-independent initialization. In this paper, we move beyond these limitations and analyze the dynamics of randomly initialized tensor power iteration up to polynomially many steps. Our contributions are threefold: First, we establish sharp …
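
    As a quick illustration of the iteration itself, here is a minimal NumPy sketch of power iteration on a planted rank-one 3-tensor. The spike strength `beta`, the noise scale, and the warm start are illustrative choices, not the paper's scaling; randomly initialized iterates, the paper's actual focus, are exactly the harder regime this toy sidesteps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Planted spike: T = beta * v0 (x) v0 (x) v0 + small Gaussian noise
# (a toy stand-in for the spiked tensor PCA model).
v0 = rng.standard_normal(n)
v0 /= np.linalg.norm(v0)
beta = 5.0
T = beta * np.einsum("i,j,k->ijk", v0, v0, v0)
T += 0.05 * rng.standard_normal((n, n, n))

def tensor_power_iteration(T, v, num_iters=30):
    """Iterate v <- T(., v, v) / ||T(., v, v)||_2 for a 3-tensor T."""
    for _ in range(num_iters):
        v = np.einsum("ijk,j,k->i", T, v, v)
        v /= np.linalg.norm(v)
    return v

# Warm start near v0 so the toy converges reliably.
v_init = v0 + 0.2 * rng.standard_normal(n)
v_hat = tensor_power_iteration(T, v_init)
overlap = abs(v_hat @ v0)  # close to 1 when the spike is recovered
```

    Each step contracts the tensor against the current iterate twice and renormalizes, directly generalizing matrix power iteration.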

  • BenchMARL: Benchmarking Multi-Agent Reinforcement Learning

    Updated: 2024-09-30 23:34:30
    Matteo Bettini, Amanda Prorok, Vincent Moens. 25(217):1–10, 2024. Abstract: The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across …

  • Optimal Locally Private Nonparametric Classification with Public Data

    Updated: 2024-09-30 23:34:30
    Yuheng Ma, Hanfang Yang. 25(167):1–62, 2024. Abstract: In this work, we investigate the problem of public data assisted non-interactive Locally Differentially Private (LDP) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the minimax optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a …

  • Split Conformal Prediction and Non-Exchangeable Data

    Updated: 2024-09-30 23:34:30
    Roberto I. Oliveira, Paulo Orenstein, Thiago Ramos, João Vitor Romano. 25(225):1–38, 2024. Abstract: Split conformal prediction (CP) is arguably the most popular CP method for uncertainty quantification, enjoying both academic interest and widespread deployment. However, the original theoretical analysis of split CP makes the crucial assumption of data exchangeability, which hinders many real-world applications. In this paper, we present a novel theoretical framework based on concentration inequalities and decoupling properties of the data, …

  • Bayesian Regression Markets

    Updated: 2024-09-30 23:34:30
    Although machine learning tasks are highly sensitive to the quality of input data, relevant datasets can often be challenging for firms to acquire, especially when held privately by a variety of owners. For instance, if these owners are competitors in a downstream market, they may be reluctant to share information. Focusing on supervised learning for regression tasks, we develop a regression market to provide a monetary incentive for data sharing. Our mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in the literature expose the market agents to sizeable financial risks, which can be mitigated in our setup.

  • Distribution Learning via Neural Differential Equations: A Nonparametric Statistical Perspective

    Updated: 2024-09-30 23:34:30
    Youssef Marzouk, Zhi Robert Ren, Sven Wang, Jakob Zech. 25(232):1–61, 2024. Abstract: Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for representing complex probability distributions. While such models have achieved enormous success in machine learning, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution …

  • Parallel-in-Time Probabilistic Numerical ODE Solvers

    Updated: 2024-09-30 23:34:30
    Nathanael Bosch, Adrien Corenflos, Fatemeh Yaghoobi, Filip Tronarp, Philipp Hennig, Simo Särkkä. 25(206):1–27, 2024. Abstract: Probabilistic numerical solvers for ordinary differential equations (ODEs) treat the numerical simulation of dynamical systems as problems of Bayesian state estimation. Aside from producing posterior distributions over ODE solutions and thereby quantifying the numerical approximation error of the method itself, one less-often noted advantage of this formalism is the algorithmic flexibility gained by formulating numerical …

  • Variance estimation in graphs with the fused lasso

    Updated: 2024-09-30 23:34:30
    Oscar Hernan Madrid Padilla. 25(250):1–45, 2024. Abstract: We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs …
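
    The paper's own linear-time estimator is not reproduced here, but a classical difference-based estimator in the same spirit, on a chain graph with a piecewise-constant mean, looks like the following (signal, noise level, and segment layout are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Piecewise-constant signal on a chain graph plus homoscedastic noise
theta = np.repeat([0.0, 3.0, 1.0, 4.0], 250)  # 4 segments, n = 1000
sigma = 0.7
y = theta + sigma * rng.standard_normal(theta.size)

# Difference-based estimate: successive differences cancel the (mostly
# constant) mean and leave noise with variance 2*sigma^2; the few jump
# locations contribute only O(1/n) bias.
sigma2_hat = np.mean(np.diff(y) ** 2) / 2
```

    This runs in linear time on the chain; extending such ideas to general graph structures is where the paper's analysis comes in.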

  • Random measure priors in Bayesian recovery from sketches

    Updated: 2024-09-30 23:34:30
    Mario Beraha, Stefano Favaro, Matteo Sesia. 25(249):1–53, 2024. Abstract: This paper introduces a Bayesian nonparametric approach to frequency recovery from lossy-compressed discrete data, leveraging all information contained in a sketch obtained through random hashing. By modeling the data points as random samples from an unknown discrete distribution endowed with a Poisson-Kingman prior, we derive the posterior distribution of a symbol's empirical frequency given the sketch. This leads to principled frequency estimates through mean …
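
    For contrast with the Bayesian posterior estimates described above, here is the classical count-min-style point estimate from a random-hashing sketch (the hash family, sketch size, and data distribution are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zipf-like discrete data stream over symbols 0..999
data = rng.zipf(1.5, size=20_000) % 1000

# Sketch: J buckets for each of d independent hash functions
d, J = 4, 256
a = rng.integers(1, 2**31 - 1, size=d)
b = rng.integers(0, 2**31 - 1, size=d)
P = 2**31 - 1  # Mersenne prime for the universal hash family

def buckets(x):
    return (a * x + b) % P % J  # one bucket index per hash function

sketch = np.zeros((d, J), dtype=int)
for x in data:
    sketch[np.arange(d), buckets(x)] += 1

def estimate(x):
    """Count-min point estimate: the min over rows never undercounts."""
    return sketch[np.arange(d), buckets(x)].min()

true_freq = np.count_nonzero(data == 1)
est = estimate(1)
```

    The count-min estimate is a deterministic upper bound on the true frequency; the paper's Bayesian approach instead produces a full posterior over frequencies given the same kind of sketch.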

  • Multi-Objective Neural Architecture Search by Learning Search Space Partitions

    Updated: 2024-09-30 23:34:30
    Yiyang Zhao, Linnan Wang, Tian Guo. 25(177):1–41, 2024. Abstract: Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks under multiple criteria. However, applying multi-objective optimization to neural architecture search (NAS) is nontrivial because NAS tasks usually have a huge …

  • From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs

    Updated: 2024-09-30 23:34:30
    Lorenz Richter, Leon Sallandt, Nikolas Nüsken. 25(248):1–40, 2024. Abstract: The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that …

  • Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

    Updated: 2024-09-30 23:34:30
    Tian-Yi Zhou, Xiaoming Huo. 25(190):1–54, 2024. Abstract: This paper studies the binary classification of unbounded data from $\mathbb{R}^d$ generated under Gaussian Mixture Models (GMMs) using deep ReLU neural networks. We obtain for the first time non-asymptotic upper bounds and convergence rates of the excess risk (excess misclassification error) for the classification without restrictions on model parameters. While the majority of existing generalization analysis of classification algorithms relies on a bounded domain, …

  • Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms

    Updated: 2024-09-30 23:34:30
    Nicolás García Trillos, Anna Little, Daniel McKenzie, James M. Murphy. 25(176):1–65, 2024. Abstract: We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, where they are induced by geodesics under a density-distorted Riemannian metric. We …

  • Nonparametric Regression Using Over-parameterized Shallow ReLU Neural Networks

    Updated: 2024-09-30 23:34:30
    Yunfei Yang, Ding-Xuan Zhou. 25(165):1–35, 2024. Abstract: It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence up to logarithmic factors for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression problem of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness …

  • Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data

    Updated: 2024-09-30 23:34:30
    Shuo-Chieh Huang, Ruey S. Tsay. 25(205):1–59, 2024. Abstract: Feature-distributed data, which refer to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large …

  • Sparse Graphical Linear Dynamical Systems

    Updated: 2024-09-30 23:34:30
    Emilie Chouzenoux, Victor Elvira. 25(223):1–53, 2024. Abstract: Time-series datasets are central in machine learning with applications in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Learning the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known to both ease the …

  • FineMorphs: Affine-Diffeomorphic Sequences for Regression

    Updated: 2024-09-30 23:34:30
    Michele Lohr, Laurent Younes. 25(245):1–38, 2024. Abstract: A multivariate regression model of affine and diffeomorphic transformation sequences, FineMorphs, is presented. Leveraging concepts from shape analysis, model states are optimally reshaped by diffeomorphisms generated by smooth vector fields during learning. Affine transformations and vector fields are optimized within an optimal control setting, and the model can naturally reduce or increase dimensionality and adapt to large data sets via sub-optimal vector fields. An existence …

  • Interpretable algorithmic fairness in structured and unstructured data

    Updated: 2024-09-30 23:34:30
    Hari Bandi, Dimitris Bertsimas, Thodoris Koukouvinos, Sofie Kupiec. 25(215):1–42, 2024. Abstract: Systemic bias with respect to gender and race is prevalent in datasets, making it challenging to train classification models that are accurate and alleviate bias. We propose a unified method for alleviating bias in structured and unstructured data, based on a novel optimization approach for optimally flipping outcome labels and training classification models simultaneously. In the case of structured data, we introduce constraints …

  • FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-based Optimization

    Updated: 2024-09-30 23:34:30
    José A. Carrillo, Nicolás García Trillos, Sixu Li, Yuhua Zhu. 25(214):1–51, 2024. Abstract: Federated learning is an important framework in modern machine learning that seeks to integrate the training of learning models from multiple users, each user having their own local data set, in a way that is sensitive to data privacy and to communication loss constraints. In clustered federated learning, one assumes an additional unknown group structure among users, and the goal is to train models …

  • Tensor-train methods for sequential state and parameter learning in state-space models

    Updated: 2024-09-30 23:34:30
    Yiran Zhao, Tiangang Cui. 25(244):1–51, 2024. Abstract: We consider sequential state and parameter learning in state-space models with intractable state transition and observation processes. By exploiting low-rank tensor train (TT) decompositions, we propose new sequential learning methods for joint parameter and state estimation under the Bayesian framework. Our key innovation is the introduction of scalable function approximation tools such as TT for recursively learning the sequentially updated posterior …

  • Differentially Private Topological Data Analysis

    Updated: 2024-09-30 23:34:30
    Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan. 25(189):1–42, 2024. Abstract: This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persistence diagrams of Čech complexes to be privatized. As an alternative, we show …

  • Spherical Rotation Dimension Reduction with Geometric Loss Functions

    Updated: 2024-09-30 23:34:30
    Hengrui Luo, Jeremy E. Purvis, Didong Li. 25(175):1–55, 2024. Abstract: Modern datasets often exhibit high dimensionality, yet the data reside in low-dimensional manifolds that can reveal underlying geometric structures critical for data analysis. A prime example of such a dataset is a collection of cell cycle measurements, where the inherently cyclical nature of the process can be represented as a circle or sphere. Motivated by the need to analyze these types of datasets, we propose a nonlinear dimension reduction method, …

  • Efficient Convex Algorithms for Universal Kernel Learning

    Updated: 2024-09-30 23:34:30
    Aleksandr Talitckii, Brendon Colbert, Matthew M. Peet. 25(203):1–40, 2024. Abstract: The accuracy and complexity of machine learning algorithms based on kernel optimization are determined by the set of kernels over which they are able to optimize. An ideal set of kernels should: admit a linear parameterization (for tractability); be dense in the set of all kernels (for robustness); and be universal (for accuracy). Recently, a framework was proposed for using positive matrices to parameterize a class of positive semi-separable kernels. Although this …

  • Nonparametric Copula Models for Multivariate, Mixed, and Missing Data

    Updated: 2024-09-30 23:34:30
    Joseph Feldman, Daniel R. Kowal. 25(164):1–50, 2024. Abstract: Modern data sets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with fully-observed variables, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies among possibly many variables of mixed data types. …

  • Euler Characteristic Tools for Topological Data Analysis

    Updated: 2024-09-30 23:34:30
    Olympio Hacquard, Vadim Lebovici. 25(240):1–39, 2024. Abstract: In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieves state-of-the-art performance in supervised tasks at a meagre computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral …
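
    The Euler characteristic of a filtration is cheap to compute pointwise, which is the source of the "meagre computational cost". A minimal sketch on a hand-built filtered complex (the simplices and birth times are made up for illustration):

```python
import numpy as np

# A small filtered simplicial complex: each simplex appears at a birth time.
# Vertices contribute +1, edges -1, triangles +1 to the Euler characteristic.
simplices = [
    ((0,), 0.0), ((1,), 0.0), ((2,), 0.0),        # three vertices at t = 0
    ((0, 1), 0.5), ((1, 2), 0.7), ((0, 2), 0.9),  # edges appear one by one
    ((0, 1, 2), 1.2),                             # triangle fills the cycle
]

def euler_profile(simplices, grid):
    """chi(t) = sum over simplices born by time t of (-1)^dim."""
    chi = np.zeros(len(grid), dtype=int)
    for verts, birth in simplices:
        sign = (-1) ** (len(verts) - 1)
        chi[grid >= birth] += sign
    return chi

grid = np.array([0.0, 0.6, 1.0, 1.5])
profile = euler_profile(simplices, grid)
# t=0.0: 3 vertices          -> chi = 3
# t=0.6: one edge added      -> chi = 2
# t=1.0: all three edges     -> chi = 0
# t=1.5: triangle filled in  -> chi = 1
```

    Unlike persistence diagrams, this requires only counting simplices by dimension at each threshold, with no boundary-matrix reduction.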

  • Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model

    Updated: 2024-09-30 23:34:30
    Ning Wang, Xin Zhang, Qing Mai. 25(222):1–85, 2024. Abstract: The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such a model. We devise a group lasso penalized EM algorithm and …

  • An Algorithmic Framework for the Optimization of Deep Neural Networks Architectures and Hyperparameters

    Updated: 2024-09-30 23:34:30
    Julie Keisler, El-Ghazali Talbi, Sandra Claudel, Gilles Cabriel. 25(201):1–33, 2024. Abstract: In this paper, we propose DRAGON (for DiRected Acyclic Graph OptimizatioN), an algorithmic framework to automatically generate efficient deep neural network architectures and optimize their associated hyperparameters. The framework is based on evolving Directed Acyclic Graphs (DAGs), defining a more flexible search space than the existing ones in the literature. It allows mixtures of different classical …

  • An Analysis of Quantile Temporal-Difference Learning

    Updated: 2024-09-30 23:34:30
    Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney. 25(163):1–47, 2024. Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed …
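
    In the simplest setting, a single state with discount zero, the QTD update reduces to stochastic subgradient steps on the pinball loss, learning the quantiles of the reward distribution. A hedged sketch (step size, quantile grid, and reward distribution are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# One-state MRP with gamma = 0: QTD reduces to SGD on the reward quantiles.
m = 5
taus = (2 * np.arange(m) + 1) / (2 * m)  # quantile midpoints 0.1, 0.3, ..., 0.9
theta = np.zeros(m)                      # current quantile estimates

alpha = 0.01
for _ in range(50_000):
    r = rng.standard_normal()  # reward sample ~ N(0, 1)
    # QTD update: each estimate moves up with "probability weight" tau_i,
    # down otherwise: theta_i += alpha * (tau_i - 1{r < theta_i})
    theta += alpha * (taus - (r < theta))

# Compare with the true N(0, 1) quantiles at the tau levels
true_q = np.array([-1.2816, -0.5244, 0.0, 0.5244, 1.2816])
err = np.max(np.abs(theta - true_q))
```

    The non-smooth indicator in the update is precisely what makes QTD harder to analyse than classical TD, whose updates are affine in the estimate.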

  • Fortuna: A Library for Uncertainty Quantification in Deep Learning

    Updated: 2024-09-30 23:34:30
    We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction, which can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods, which can be applied to deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems.

  • An Entropy-Based Model for Hierarchical Learning

    Updated: 2024-09-30 23:34:30
    Amir R. Asadi. 25(187):1–45, 2024. Abstract: Machine learning, the predominant approach in the field of artificial intelligence, enables computers to learn from data and experience. In the supervised learning framework, accurate and efficient learning of dependencies between data instances and their corresponding labels requires auxiliary information about the data distribution and the target function. This central concept aligns with the notion of regularization in statistical learning theory. Real-world datasets are often characterized by …

  • Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis

    Updated: 2024-09-30 23:34:30
    Yuanxing Chen, Qingzhao Zhang, Shuangge Ma, Kuangnan Fang. 25(211):1–60, 2024. Abstract: In diverse fields ranging from finance to omics, it is increasingly common that data are distributed across multiple individual sources, referred to as "clients" in some studies. Integrating raw data, although powerful, is often not feasible, for example, when there are considerations on privacy protection. Distributed learning techniques have been developed to integrate summary statistics as opposed to raw data. In many existing …

  • Individual-centered Partial Information in Social Networks

    Updated: 2024-09-30 23:34:30
    Xiao Han, Y. X. Rachel Wang, Qing Yang, Xin Tong. 25(230):1–60, 2024. Abstract: In statistical network analysis, we often assume either the full network is available or multiple subgraphs can be sampled to estimate various global properties of the network. However, in a real social network, people frequently make decisions based on their local view of the network alone. Here, we consider a partial information framework that characterizes the local network centered at a given individual by path length $L$ and gives rise to a partial …

  • From Small Scales to Large Scales: Distance-to-Measure Density based Geometric Analysis of Complex Data

    Updated: 2024-09-30 23:34:30
    Katharina Proksch, Christoph Alexander Weikamp, Thomas Staudt, Benoit Lelandais, Christophe Zimmer. 25(210):1–53, 2024. Abstract: How can we tell complex point clouds with different small scale characteristics apart, while disregarding global features? Can we find a suitable transformation of such data in a way that allows us to discriminate between differences in this sense with statistical guarantees? In this paper, we consider the analysis and classification of complex point clouds as they are …

  • Characterization of translation invariant MMD on $\mathbb{R}^d$ and connections with Wasserstein distances

    Updated: 2024-09-30 23:34:30
    Thibault Modeste, Clément Dombry. 25(237):1–39, 2024. Abstract: Kernel mean embeddings and maximum mean discrepancies (MMD) associated with positive definite kernels are important tools in machine learning that allow one to compare probability measures and sample distributions. We provide a full characterization of translation invariant MMDs on $\mathbb{R}^d$ that are parametrized by a spectral measure and a positive semi-definite symmetric matrix. Furthermore, we investigate the connections between translation …

  • Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

    Updated: 2024-09-30 23:34:30
    Isaac Gibbs, Emmanuel J. Candès. 25(162):1–36, 2024. Abstract: We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here, we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive …
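
    The style of adaptive update the abstract alludes to (in the spirit of adaptive conformal inference, which this paper builds on) can be sketched in a few lines; the score stream, the distribution shift, and the step size gamma below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Online coverage control: the working level alpha_t is nudged by
# gamma * (alpha - err_t) after each cover/miss outcome.
alpha, gamma = 0.1, 0.005
alpha_t = alpha
history, errs = [], []

for t in range(4000):
    scale = 1.0 if t < 2000 else 2.0           # distribution shift at t = 2000
    score = scale * abs(rng.standard_normal())  # conformity score of (x_t, y_t)
    if len(history) >= 100:
        # Conformal threshold from past scores at the current working level
        level = min(max(1 - alpha_t, 0.0), 1.0)
        q = np.quantile(history, level)
        err = float(score > q)                  # 1 if the prediction set missed y_t
        errs.append(err)
        alpha_t += gamma * (alpha - err)        # adaptive update
    history.append(score)

long_run_miss = np.mean(errs)  # long-run miscoverage stays near alpha
```

    The telescoping structure of the update forces the long-run miscoverage toward alpha even under the shift; the paper's contribution is strengthening this to small regret over all local time windows.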

  • Data-driven Automated Negative Control Estimation (DANCE): Search for, Validation of, and Causal Inference with Negative Controls

    Updated: 2024-09-30 23:34:30
    Erich Kummerfeld, Jaewon Lim, Xu Shi. 25(229):1–35, 2024. Abstract: Negative control variables are increasingly used to adjust for unmeasured confounding bias in causal inference using observational data. They are typically identified by subject matter knowledge, and there is currently a severe lack of data-driven methods to find negative controls. In this paper, we present a statistical test for discovering negative controls of a special type, disconnected negative …

  • PAMI: An Open-Source Python Library for Pattern Mining

    Updated: 2024-09-30 23:34:30
    PAMI: An Open-Source Python Library for Pattern Mining. Uday Kiran Rage, Veena Pamalla, Masashi Toyoda, Masaru Kitsuregawa. 25(209):1–6, 2024. Abstract: Crucial information that can empower users with competitive knowledge for socio-economic development lies hidden in big data. Pattern mining aims to discover this information by finding user-interest-based patterns in big data. Unfortunately, existing pattern mining libraries are limited to finding a few types of patterns in transactional and sequence databases. This paper tackles this problem by providing a cross-platform […]
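
    PAMI's own algorithms are not reproduced here, but the primitive that pattern mining in transactional databases builds on, support counting for itemsets, can be sketched in a few lines. The toy database and the min_support threshold below are illustrative.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of size <= max_size appearing in at least
    min_support transactions (the basic primitive behind Apriori-style
    frequent pattern mining)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # dedupe and canonicalize item order
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

db = [["bread", "milk"], ["bread", "butter"],
      ["bread", "milk", "butter"], ["milk"]]
print(frequent_itemsets(db, min_support=2))
```

    Real libraries avoid this brute-force enumeration by pruning: any superset of an infrequent itemset is itself infrequent.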

  • Unsupervised Tree Boosting for Learning Probability Distributions

    Updated: 2024-09-30 23:34:30
    Unsupervised Tree Boosting for Learning Probability Distributions. Naoki Awaya, Li Ma. 25(198):1–52, 2024. Abstract: We propose an unsupervised tree boosting algorithm for inferring the underlying sampling distribution of an i.i.d. sample, based on fitting additive tree ensembles in a manner analogous to supervised tree boosting. Integral to the algorithm is a new notion of addition on probability distributions that leads to a coherent notion of residualization, i.e., subtracting a probability distribution from an observation to remove the distributional structure from the sampling distribution of the […]

  • A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

    Updated: 2024-09-30 23:34:30
    A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. Youngseok Kim, Wei Wang, Peter Carbonetto, Matthew Stephens. 25(185):1–59, 2024. Abstract: We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible adaptive shrinkage priors, which approximate the nonparametric family of scale mixtures of normals by a finite mixture of normal distributions, and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute […]
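
    The first idea can be made concrete: under a normal likelihood and a finite scale-mixture-of-normals prior, the posterior mean is a weighted combination of per-component linear shrinkage rules. The sketch below fixes the mixture weights and scales by hand; in the paper they are estimated from the data by variational empirical Bayes.

```python
import numpy as np

def mixture_posterior_mean(x, s, pis, sigmas):
    """Posterior mean of theta for x ~ N(theta, s^2) under the prior
    theta ~ sum_k pis[k] * N(0, sigmas[k]^2), a finite scale mixture
    of normals as used by adaptive shrinkage."""
    pis = np.asarray(pis, float)
    v = np.asarray(sigmas, float) ** 2
    marg_var = v + s ** 2
    # Marginal density of x under each component: N(0, sigma_k^2 + s^2)
    lik = np.exp(-x ** 2 / (2 * marg_var)) / np.sqrt(2 * np.pi * marg_var)
    w = pis * lik
    w = w / w.sum()  # posterior component responsibilities
    # Each component shrinks x linearly by sigma_k^2 / (sigma_k^2 + s^2)
    return float(np.sum(w * (v / marg_var) * x))

# Strong shrinkage for a small observation, mild for a large one
print(mixture_posterior_mean(0.5, 1.0, [0.5, 0.5], [0.1, 2.0]))
print(mixture_posterior_mean(5.0, 1.0, [0.5, 0.5], [0.1, 2.0]))
```

    This adaptivity, strong shrinkage near zero and weak shrinkage for large signals, is what a single normal prior cannot provide.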

  • Linear Regression With Unmatched Data: A Deconvolution Perspective

    Updated: 2024-09-30 23:34:30
    Linear Regression With Unmatched Data: A Deconvolution Perspective. Mona Azadkia, Fadoua Balabdaoui. 25(197):1–55, 2024. Abstract: Consider the regression problem where the response $Y \in \mathbb{R}$ and the covariate $X \in \mathbb{R}^d$ for $d \geq 1$ are unmatched. Under this scenario, we do not have access to pairs of observations from the distribution of $(X, Y)$; instead, we have separate data sets $\{Y_i\}_{i=1}^{n_Y}$ and $\{X_j\}_{j=1}^{n_X}$, possibly collected from different sources. We study this problem assuming that the regression function is linear and the noise distribution is known, an assumption that we […]
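
    A minimal illustration of why unmatched data can still be informative: in one dimension with known noise variance, the moment identity Var(Y) = beta^2 Var(X) + sigma^2 pins down |beta|. This is a toy method-of-moments sketch, not the paper's deconvolution estimator, and the sign of beta is not identified this way.

```python
import numpy as np

def beta_magnitude(y, x, noise_sd):
    """Estimate |beta| in Y = beta * X + eps from *unmatched* samples
    of Y and X, via Var(Y) = beta^2 * Var(X) + sigma^2.
    Illustrative only: higher-order information is needed for the sign."""
    num = max(np.var(y) - noise_sd ** 2, 0.0)
    return float(np.sqrt(num / np.var(x)))

rng = np.random.default_rng(2)
x_pop = rng.normal(size=100000)
# Y drawn from an *independent* copy of X: no (X, Y) pairing exists
y_pop = -1.5 * rng.normal(size=100000) + rng.normal(scale=0.5, size=100000)
print(beta_magnitude(y_pop, x_pop, noise_sd=0.5))  # close to |beta| = 1.5
```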

  • Risk Measures and Upper Probabilities: Coherence and Stratification

    Updated: 2024-09-30 23:34:30
    Risk Measures and Upper Probabilities: Coherence and Stratification. Christian Fröhlich, Robert C. Williamson. 25(207):1–100, 2024. Abstract: Machine learning typically presupposes classical probability theory, which implies that aggregation is built upon expectation. There are now multiple reasons to look at richer alternatives to classical probability theory as a mathematical foundation for machine learning. We systematically examine a powerful and rich class of alternative aggregation functionals, known variously as spectral risk measures, Choquet integrals, or Lorentz norms. We […]
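
    A spectral risk measure aggregates by applying non-decreasing weights to losses sorted from best to worst; CVaR is the canonical member, obtained by putting uniform weight on the worst tail. A minimal sketch (illustrative, not from the paper):

```python
import numpy as np

def spectral_risk(losses, weights):
    """Spectral risk measure: non-decreasing weights summing to 1,
    applied to losses sorted from smallest to largest."""
    w = np.asarray(weights, float)
    assert np.all(np.diff(w) >= 0) and np.isclose(w.sum(), 1.0)
    return float(np.sort(losses) @ w)

def cvar(losses, alpha=0.9):
    """CVaR_alpha as a spectral risk measure: uniform weight on the
    worst (1 - alpha) fraction of outcomes."""
    n = len(losses)
    k = int(np.ceil((1 - alpha) * n))
    w = np.zeros(n)
    w[-k:] = 1.0 / k
    return spectral_risk(losses, w)

losses = np.arange(1.0, 11.0)   # losses 1..10
print(cvar(losses, alpha=0.8))  # mean of the worst 20%: (9 + 10) / 2
print(float(losses.mean()))     # expectation, the risk-neutral aggregator
```

    Because the weights are non-decreasing, any spectral risk measure is at least the expectation, which corresponds to uniform weights.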

  • More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization

    Updated: 2024-09-30 23:34:30
    More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization. Xu Liu, Heng Lian, Jian Huang. 25(161):1–27, 2024. Abstract: We consider parsimonious modeling of high-dimensional multivariate additive models using regression splines, with or without sparsity assumptions. The approach is based on treating the coefficients in the spline expansions as a third-order tensor. Note that the data have neither tensor predictors nor tensor responses, which distinguishes our study from existing ones. A Tucker decomposition is used to reduce the number of parameters in […]
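
    The parameter saving from the Tucker step is easy to quantify: a full coefficient tensor costs the product of its mode sizes, while a Tucker decomposition costs a small core plus one factor matrix per mode. The mode sizes and ranks below are made-up numbers for illustration, not values from the paper.

```python
import numpy as np

def tucker_param_count(dims, ranks):
    """Parameters in a Tucker decomposition of a tensor with the given
    mode sizes: a core of size prod(ranks) plus one d_i x r_i factor
    matrix per mode."""
    core = int(np.prod(ranks))
    factors = sum(d * r for d, r in zip(dims, ranks))
    return core + factors

dims = (50, 10, 20)        # hypothetical: basis size x responses x predictors
ranks = (5, 3, 4)          # hypothetical Tucker ranks
full = int(np.prod(dims))  # coefficients without any decomposition
print(full, tucker_param_count(dims, ranks))  # 10000 vs 420
```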

  • Memory-Efficient Sequential Pattern Mining with Hybrid Tries

    Updated: 2024-09-30 23:34:30
    This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data structure that exploits recurring patterns to compactly store the data set in memory, and a corresponding mining algorithm designed to effectively extract patterns from this compact representation. Numerical results on small to medium-sized real-life test instances show an average improvement of 85% in memory consumption and 49% in computation time compared to the state of the art. For large data sets, our algorithm stands out as the only SPM approach capable of running within 256GB of system memory, potentially saving 1.7TB in memory consumption.
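
    The paper's hybrid trie itself is not reproduced here, but the baseline structure it compresses, a plain sequence trie with per-prefix counts, can be sketched as follows; the node layout and the prefix-support query are illustrative.

```python
class TrieNode:
    """Plain sequence trie: the baseline that a hybrid trie compacts
    by switching node representations for recurring patterns."""
    __slots__ = ("children", "count")
    def __init__(self):
        self.children = {}
        self.count = 0  # number of stored sequences passing through here

def insert(root, seq):
    node = root
    for item in seq:
        node = node.children.setdefault(item, TrieNode())
        node.count += 1

def support(root, pattern):
    """Support of a *prefix* pattern: how many stored sequences
    start with it."""
    node = root
    for item in pattern:
        if item not in node.children:
            return 0
        node = node.children[item]
    return node.count

root = TrieNode()
for s in [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]:
    insert(root, s)
print(support(root, ["a", "b"]))  # 2 sequences share the prefix a, b
```

    A trie shares storage across common prefixes; the memory bottleneck the paper targets comes from the many nodes such a structure still needs on large data sets.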

  • Improved Random Features for Dot Product Kernels

    Updated: 2024-09-30 23:34:30
    Improved Random Features for Dot Product Kernels. Jonas Wacker, Motonobu Kanagawa, Maurizio Filippone. 25(235):1–75, 2024. Abstract: Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions to improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in […]
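
    The classic baseline that such work improves on is the tensorized random projection for polynomial kernels: each feature is a product of p independent random projections, which is unbiased for (x·y)^p when the projection entries are i.i.d. Rademacher. A sketch of that baseline (not the paper's improved construction):

```python
import numpy as np

def poly_random_features(X, p, D, rng):
    """Tensorized random projection features for the degree-p polynomial
    kernel (x . y)^p: phi_j(x) = (1/sqrt(D)) * prod_k <w_{jk}, x>, with
    i.i.d. Rademacher entries in each w_{jk}, so that
    E[phi(x) . phi(y)] = (x . y)^p."""
    W = rng.choice([-1.0, 1.0], size=(D, p, X.shape[1]))
    proj = np.einsum("nd,jkd->njk", X, W)  # proj[i, j, k] = <w_{jk}, x_i>
    return proj.prod(axis=2) / np.sqrt(D)

rng = np.random.default_rng(3)
x = np.array([0.6, 0.8, 0.0])
y = np.array([0.0, 0.6, 0.8])
Z = poly_random_features(np.stack([x, y]), p=2, D=100000, rng=rng)
approx = float(Z[0] @ Z[1])
print(approx, (x @ y) ** 2)  # approx is close to (x . y)^2 = 0.2304
```

    The variance of this estimator grows with the degree p, which is one of the inefficiencies the paper's improved constructions address.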

  • Permuted and Unlinked Monotone Regression in R^d: an approach based on mixture modeling and optimal transport

    Updated: 2024-09-30 23:34:30
    Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach based on mixture modeling and optimal transport. Martin Slawski, Bodhisattva Sen. 25(183):1–57, 2024. Abstract: Suppose that we have a regression problem with response variable $Y \in \mathbb{R}^d$ and predictor $X \in \mathbb{R}^d$, for $d \geq 1$. In permuted or unlinked regression we have access to separate unordered data on $X$ and $Y$, as opposed to data on $(X, Y)$ pairs as in usual regression. So far in the literature the case $d=1$ has received attention; see, e.g., the recent papers by Rigollet and Weed (Information and Inference, 8, 619–717) and […]
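
    A useful special case for intuition: in $d=1$ with no noise, unlinked monotone regression reduces to matching quantiles, i.e., sorting both samples, since one-dimensional optimal transport between empirical distributions is given by sorting. The sketch below is this noiseless toy case, not the paper's mixture-modeling estimator for general $d$.

```python
import numpy as np

def unlinked_monotone_fit(x_sample, y_sample):
    """Noiseless d = 1 illustration: if Y = m(X) for nondecreasing m,
    the quantiles of Y are m applied to the quantiles of X, so sorting
    both unmatched samples (1-D optimal transport) recovers m on the
    sample points."""
    xs = np.sort(x_sample)
    ys = np.sort(y_sample)
    return xs, ys  # the pairs (xs[i], ys[i]) estimate the graph of m

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=1000)
y = rng.permutation(np.exp(x))  # unmatched, shuffled sample of Y = exp(X)
xs, ys = unlinked_monotone_fit(x, y)
print(np.max(np.abs(ys - np.exp(xs))))  # 0: exact recovery without noise
```

    With noise the sorting map is no longer exact, which is where deconvolution-style arguments enter.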

  • On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing

    Updated: 2024-09-30 23:34:30
    On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing. Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis. 25(169):1–47, 2024. Abstract: We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the specified rank is larger than the true rank. We refer to this as over-parameterized matrix sensing. If the ground truth signal $\mathbf{X} \in \mathbb{R}^{d \times d}$ is of rank $r$, but we try to recover it using $\mathbf{F}\mathbf{F}^\top$ where $\mathbf{F} \in \mathbb{R}^{d \times k}$ and $k > r$, the existing statistical analysis […]
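
    The over-parameterized FGD dynamics can be simulated directly on the population objective (identity sensing). The step size, initialization scale, and iteration count below are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

def factorized_gd(X, k, lr=0.05, steps=2000, seed=0):
    """Factorized gradient descent on f(F) = 0.25 * ||F F^T - X||_F^2,
    the population objective of matrix sensing, with specified rank k
    possibly larger than rank(X) (the over-parameterized regime)."""
    d = X.shape[0]
    rng = np.random.default_rng(seed)
    F = 0.1 * rng.normal(size=(d, k))  # small random initialization
    for _ in range(steps):
        F -= lr * (F @ F.T - X) @ F    # gradient of f at F
    return F

# Rank-1 ground truth, fit with over-specified rank k = 3
u = np.array([[1.0], [2.0], [0.5]])
X = u @ u.T
F = factorized_gd(X, k=3)
print(np.linalg.norm(F @ F.T - X))  # small residual after training
```

    The signal directions are fit quickly, while the excess rank directions shrink only polynomially, the slowdown that the over-parameterized analysis has to quantify.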

  • Garrett Archer: on the election data frontline in Arizona

    Updated: 2024-09-30 23:34:29
    In the latest episode of the Data Journalism podcast: Garrett Archer is the data analyst at ABC15 in Phoenix, Arizona, where he is a data storyteller and one of the foremost experts on Arizona’s election system. As America votes, Garrett will be responsible for reporting the facts in one of the most tightly-fought US elections … Continue reading →

  • New Real-Life Data Visualization Examples — DataViz Weekly

    Updated: 2024-09-30 23:34:25
    Each week, DataViz Weekly brings you a curated selection of charts and maps based on real-life data. Welcome to our new roundup, continuing to demonstrate how effective data graphics can truly help make sense of complex topics. Take a look at the new data visualization examples we’ve lately found worth highlighting: The United Kingdom’s coal-free […] The post New Real-Life Data Visualization Examples — DataViz Weekly appeared first on AnyChart News.

  • Figures Without Any Charts — JS Chart Tips

    Updated: 2024-09-30 23:34:25
    September 23rd, 2024, by AnyChart Team. Exploring minimalistic data presentation, this entry of JS Chart Tips shifts focus from complex visualizations to effectively showing raw numerical data. While our JavaScript charting library is designed to enable compelling graphical data displays, sometimes simplicity provides clearer insights.

  • Discovering Fresh Compelling Visual Data Stories — DataViz Weekly

    Updated: 2024-09-30 23:34:25
    When properly visualized, data comes to life and reveals the stories hidden within the numbers. In this edition of DataViz Weekly, we showcase a selection of new projects that present data in compelling and insightful ways. Let’s dive into the visual data stories that caught our attention this week. Neglected tropical diseases — Nexo Migrants […] The post Discovering Fresh Compelling Visual Data Stories — DataViz Weekly appeared first on AnyChart News.

  • Austrian Academy of Sciences (ÖAW) Uses AnyChart JS for Literary Data Visualization

    Updated: 2024-09-30 23:34:25
    Visualizing data from a literary work can clarify complex structures and patterns, enhancing the understanding of its content and context. For example, a timeline chart can effectively organize events mentioned in the text into an intuitive graphical form, which can be especially beneficial for historical texts. Today, we are pleased to share a project where […] The post Austrian Academy of Sciences (ÖAW) Uses AnyChart JS for Literary Data Visualization appeared first on AnyChart News.

  • New Interesting Data Visualizations to Explore — DataViz Weekly

    Updated: 2024-09-30 23:34:25
    Another week, another collection of new data visualizations! Check out some of the most interesting examples we’ve discovered recently, curated for DataViz Weekly. Predicting the outcome of the 2024 U.S. presidential election — NBC News Impact of air alerts on Kyiv’s public transport — Text.org.ua U.S. fall foliage in 2024 — SmokyMountains.com Child mortality due […] The post New Interesting Data Visualizations to Explore — DataViz Weekly appeared first on AnyChart News.

  • JavaScript Pie Chart with Radial Scale — JS Chart Tips

    Updated: 2024-09-30 23:34:25
    Welcome to JS Chart Tips, our new blog series where we showcase practical solutions to common and unique challenges our Support Team has helped customers overcome. This time, we’re eager to explain how to build a sophisticated circular diagram that may resemble a pie chart with a radial scale. Just a heads-up: this type of visualization […] The post JavaScript Pie Chart with Radial Scale — JS Chart Tips appeared first on AnyChart News.

  • ✚ Visualization Tools, Datasets, and Resources – September 2024 Roundup

    Updated: 2024-09-30 23:34:21
    September 26, 2024. Topic: The Process, roundup. Hi, it’s Nathan. This is The Process, the newsletter for FlowingData members. Every month, I collect tools, datasets, and resources to make better charts. Here’s the good stuff for September 2024. The Process is a weekly newsletter on how visualization tools, rules, and guidelines work in practice, published every Thursday, in your inbox or on FlowingData.

Current Feed Items | Previous Months Items

Aug 2024 | Jul 2024 | Jun 2024 | May 2024 | Apr 2024 | Mar 2024